Augmented Query Strategies for Active Learning in Stream Data Mining

نویسندگان

  • Mustafa Amir Faisal
  • Zeyar Aung
  • Wei Lee Woon
  • Davor Svetinovic
چکیده

Active learning is used in situations where the amount of unlabeled data is abundant but it is costly to manually label the data. So, depending on our available budget, from all unlabeled instances we are to select only a subset of them to ask the oracle for manual labeling. Thus, the query strategy, i.e., how relevant instances are selected to be sent to the oracle, plays an important role in active learning. Though active learning is a very established research area, only a few research works have been done on it in the context of stream data mining. Active learning for stream data is more challenging than for static data because the repetition of queries is not feasible as revisiting the data is almost impossible. In this paper, we propose two augmented query strategies for active learning in stream data mining, namely, Margin Sampling with Variable Uncertainty (MSVU) and Entropy Sampling with Uncertainty using Randomization (ESUR). These two strategies are derived and improved from the existing methods of Variable Uncertainty (VU) and Uncertainty using Randomization (UR) respectively. We evaluate the effectiveness of our proposed MSVU and ESUR strategies by comparing them against the original VU and UR on 6 different datasets using two base classifiers: Leveraging Bagging (LB) and Single Classifier Drift (SCD). Experimental results show that our proposed strategies offer promising outcomes for various datasets and detecting concept drift in the data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Investigating Active Learning for Concept Prerequisite Learning

Concept prerequisite learning focuses on machine learning methods for measuring the prerequisite relation among concepts. With the importance of prerequisites for education, it has recently become a promising research direction. A major obstacle to extracting prerequisites at scale is the lack of large scale labels which will enable effective data driven solutions. We investigate the applicabil...

متن کامل

Stream-Based Active Unusual Event Detection

We present a new active learning approach to incorporate human feedback for on-line unusual event detection. In contrast to most existing unsupervised methods that perform passive mining for unusual events, our approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust and accurate detection on subtle unusual events. The active lea...

متن کامل

Web Services for Stream Mining: A Stream-Based Active Learning Use Case

The nature of data on the Web is becoming more and more streamoriented and in this context, the idea of mining Web-generated data streams is becoming a hot topic. Web services, on the other hand, have become an inevitable tool for the future development of the Web. While Web services have been very successful in providing distributed computing environments, they have not been exploited for buil...

متن کامل

Active learning with support vector machines

In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying dat...

متن کامل

Online Passive Aggressive Active Learning and Its Applications

We investigate online active learning techniques for classification tasks in data stream mining applications. Unlike traditional learning approaches (either batch or online learning) that often require to request the class label of each incoming instance, online active learning queries only a subset of informative incoming instances to update the classification model, which aims to maximize cla...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014